(54) Title: CHROMOSOME-WIDE ANALYSIS OF PROTEIN-DNA INTERACTIONS



(57) Abstract: The present invention relates to a method of identifying a region (one or more) of a genome of a cell to which a
protein of interest binds. In the methods described herein, DNA binding protein of a cell is linked (e.g., covalently crosslinked) to
genomic DNA of a cell. The genomic DNA to which the DNA binding protein is linked is removed and combined or contacted
with DNA comprising a sequence complementary to genomic DNA of the cell under conditions in which hybridization between the
identified genomic DNA and the sequence complementary to genomic DNA occurs. Region(s) of hybridization are region(s) of the
genome of the cell to which the protein binds.

CHROMOSOME-WIDE ANALYSIS OF PROTEIN-DNA INTERACTIONS

BACKGROUND OF THE INVENTION 

Many proteins involved in regulating genome expression, chromosomal
replication and cellular proliferation function through their ability to bind specific
sites in the genome. Transcriptional activators, for example, bind to specific
promoter sequences and recruit chromatin modifying complexes and the
transcription apparatus to initiate RNA synthesis. The remodeling of gene
expression that occurs as cells move through the cell cycle, or when cells sense
changes in their environment, is effected in part by changes in the DNA-binding
status of transcriptional activators. Distinct DNA-binding proteins are also
associated with centromeres, telomeres, and origins of DNA replication, where they
regulate chromosome replication and maintenance. Although considerable
knowledge of many fundamental aspects of gene expression and DNA replication
has been obtained from studies of DNA-binding proteins, an understanding of these
proteins and their functions is limited by our knowledge of their binding sites in the
genome.

Proteins which bind to a particular region of DNA can be detected using 
known methods. However, a need exists for a method which allows examination of 
the binding of proteins to DNA across the entire genome of an organism. 

SUMMARY OF THE INVENTION

The present invention relates to a method of identifying a region (one or
more) of a genome of a cell to which a protein of interest binds. In the methods
described herein, DNA binding protein of a cell is linked (e.g., covalently
crosslinked) to genomic DNA of a cell. The genomic DNA to which the DNA
binding protein is linked is identified and combined or contacted with DNA
comprising a sequence complementary to genomic DNA of the cell (e.g., all or a
portion of a cell's genomic DNA such as one or more chromosomes or chromosome
regions) under conditions in which hybridization between the identified genomic
DNA and the sequence complementary to genomic DNA occurs. Region(s) of
hybridization are region(s) of the genome of the cell to which the protein of interest
binds. The methods of the present invention are preferably performed using living
cells.

In one embodiment, proteins which bind DNA in a cell are crosslinked to the
cellular DNA. The resulting mixture, which includes DNA bound by protein and
DNA which is not bound by protein, is subjected to shearing conditions. As a result,
DNA fragments of the genome crosslinked to DNA binding protein are generated
and the DNA fragment (one or more) to which the protein of interest is bound is
removed from the mixture. The resulting DNA fragment is then separated from the
protein of interest and amplified, using known methods. The DNA fragment is
combined with DNA comprising a sequence complementary to genomic DNA of the
cell, under conditions in which hybridization between the DNA fragment and a
region of the sequence complementary to genomic DNA occurs; and the region of
the sequence complementary to genomic DNA to which the DNA fragment
hybridizes is identified. The identified region (one or more) is a region of the
genome of the cell, such as a selected chromosome or chromosomes, to which the
protein of interest binds.

In a particular embodiment, the present invention relates to a method of
identifying a region of a genome (such as a region of a chromosome) of a cell to
which a protein of interest binds, wherein the DNA binding protein of the cell is
crosslinked to genomic DNA of the cell using formaldehyde. DNA fragments of the
crosslinked genome are generated and the DNA fragment to which the protein of
interest is bound is removed or separated from the mixture, such as through
immunoprecipitation using an antibody that specifically binds the protein of interest.
This results in separation of the DNA-protein complex. The DNA fragment in the
complex is separated from the protein of interest, for example, by subjecting the
complex to conditions which reverse the crosslinks. The separated DNA fragment is
amplified using ligation-mediated polymerase chain reaction (LM-PCR), and then
fluorescently labeled. The labeled DNA fragment is contacted with a DNA
microarray comprising a sequence complementary to genomic DNA of the cell,
under conditions in which hybridization between the DNA fragment and a region of
the sequence complementary to genomic DNA occurs. The region of the sequence
complementary to genomic DNA to which the DNA fragment hybridizes is
identified by measuring fluorescence intensity, and the fluorescence intensity of the
region of the sequence complementary to genomic DNA to which the DNA
fragment hybridizes is compared to the fluorescence intensity of a control.
Fluorescence intensity in a region of the sequence complementary to genomic DNA
which is greater than the fluorescence intensity of the control in that region of the
sequence complementary to genomic DNA marks the region of the genome in the
cell to which the protein of interest binds.

Also encompassed by the present invention is a method of determining a
function of a protein of interest which binds to the genomic DNA of a cell. In this
method, DNA binding protein of the cell is crosslinked to the genomic DNA of the
cell. DNA fragments of the genome crosslinked to DNA binding protein are then
generated, as described above, and the DNA fragment (one or more) to which the
protein of interest is bound is removed from the mixture. The resulting DNA
fragment is then separated from the protein of interest and amplified. The DNA
fragment is combined with DNA comprising a sequence complementary to genomic
DNA of the cell, under conditions in which hybridization between the DNA
fragment and a region of the sequence complementary to genomic DNA occurs; and
the region of the sequence complementary to genomic DNA to which the DNA
fragment hybridizes is identified. This identified region is a region of the genome of
the cell to which the protein of interest binds. The identified region is characterized
and the characteristic of the identified region indicates the function of the protein of
interest (e.g., a regulatory protein such as a transcription factor; an oncoprotein).

The present invention also relates to a method of determining whether a
protein of interest which binds to genomic DNA of a cell functions as a transcription
factor. In one embodiment, DNA binding protein of the cell is crosslinked to the
genomic DNA of the cell. DNA fragments of the crosslinked genome are generated
and the DNA fragment to which the protein of interest is bound is removed from the
mixture. The resulting DNA fragment is separated from the protein of interest and
amplified. The DNA fragment is combined with DNA comprising a sequence
complementary to genomic DNA of the cell, under conditions in which
hybridization between the DNA fragment and a region of the sequence
complementary to genomic DNA occurs. The region of the sequence
complementary to genomic DNA to which the DNA fragment hybridizes is
identified; wherein if the region of the genome is a regulatory region, then the
protein of interest is a transcription factor.

The methods described herein facilitate the dissection of the cell's regulatory
network of gene expression across the entire genome and aid in the identification of
gene function.



BRIEF DESCRIPTION OF THE DRAWINGS 

The file of this patent contains at least one drawing executed in color.
Copies of this patent with color drawing(s) will be provided by the Patent and
Trademark Office upon request and payment of the necessary fee.

Figure 1 is an illustration of the Genome-wide Monitoring Protein-DNA
interactions described herein.

Figure 2 shows how the relative binding of the protein of interest to each
sequence represented on an array was calculated using a weighted average analysis.

Figure 3 is a graph of chromosomal position versus fold change of
Genome-wide Monitoring Protein-DNA interactions.

Figure 4 is a graph of chromosome position versus ratio of tagged to
untagged for binding of ORC1 to yeast chromosome III.

Figure 5A is an example of a scanned image. The unenriched and
IP-enriched DNA generate green fluorescence and red fluorescence, respectively. The
close-up image shows examples of spots for which the red intensity is
over-represented, indicating binding of the targeted protein to these DNA sequences.

Figure 5B shows that small amounts of DNA can be quantitatively amplified
and labeled with Cy3 and Cy5 fluorophores. Cy3- and Cy5-labeled DNA from 1 ng
of yeast genomic DNA was prepared using the LM-PCR method described in the
text. The resulting DNA samples were mixed and hybridized to a yeast intragenic
DNA microarray. Low intensity spots have larger variations than high intensity
spots, probably due to background noise.

Figure 6A shows the set of 24 genes whose promoter regions are most likely
to be bound by Gal4 by the analysis criteria described herein.

Figure 6B is a schematic of the Gal4 binding intergenic regions.

Figure 6C shows the results of conventional ChIP analysis.

Figure 6D shows the results of the AlignAce program used to identify a
consensus binding site for the Gal4 activator.

Figure 6E is a bar graph showing relative expression of PCL10 and MTH1.

Figure 6F is a schematic illustrating how the identification of MTH1 and
MTH, PCL10 and FUR4 as Gal4-regulated genes reveals how several different
metabolic pathways are interconnected.

Figure 7 lists the set of genes whose promoter regions are most likely to be
bound by Ste12 by the analysis criteria described herein.

DETAILED DESCRIPTION OF THE INVENTION 

Understanding how DNA-binding proteins control global gene expression,
chromosomal replication and cellular proliferation would be facilitated by
identification of the chromosomal locations at which these proteins function in vivo.
Described herein is a genome-wide location profiling method for DNA-bound
proteins, which has been used to monitor dynamic binding of gene-specific
transcription factors and components of the general transcription apparatus in yeast
cells. The genome-wide location method correctly identified known sites of action
for the transcriptional activators Gal4 and Ste12 and revealed unexpected functions
for these activators. The combination of expression and location profiles identified
the global set of genes whose expression is under the direct control of specific
activators and components of the transcription apparatus as cells responded to
changes in their extracellular environment. Genome-wide location analysis provides
a powerful tool for further dissecting gene regulatory networks, annotating gene
functions and exploring how genomes are replicated.

Accordingly, the present invention provides methods of examining the
binding of proteins to DNA across the genome (e.g., the entire genome or a portion
thereof, such as one or more chromosomes or chromosome regions) of an
organism. In particular, the present invention relates to a method of identifying a
region (one or more) of genomic DNA of a cell to which a protein of interest binds.
In one embodiment, proteins which bind DNA in a cell are crosslinked to the
cellular DNA. The resulting mixture, which includes DNA bound by protein and
DNA which is not bound by protein, is subjected to shearing conditions. As a result,
DNA fragments of the genome crosslinked to DNA binding protein are generated
and the DNA fragment (one or more) to which the protein of interest is bound is
removed from the mixture. The resulting DNA fragments are then separated from
the protein of interest and amplified using known techniques. The DNA fragment is
then combined with DNA comprising a sequence complementary to genomic DNA
of the cell, under conditions in which hybridization between the DNA fragments and
the sequence complementary to genomic DNA occurs; and the region of the
sequence complementary to genomic DNA to which the DNA fragment hybridizes
is identified. The identified region is a region of the genome of the cell to which the
protein of interest binds.

Also encompassed by the present invention is a method of determining a
function of a protein of interest which binds to the genomic DNA of a cell. In this
method, DNA binding protein of the cell is crosslinked to the genomic DNA of the
cell. DNA fragments of the genome crosslinked to DNA binding protein are then
generated, as described above, and the DNA fragment (one or more) to which the
protein of interest is bound is removed. The resulting DNA fragment is then
separated from the protein of interest and amplified. The DNA fragment is then
combined with DNA comprising a sequence complementary to genomic DNA of the
cell, under conditions in which hybridization between the DNA fragment and a
region of the sequence complementary to genomic DNA occurs; and the region of
the sequence complementary to genomic DNA to which the DNA fragment
hybridizes is identified and is a region of the genome of the cell to which the protein
of interest binds. The identified region is characterized (e.g., a regulatory region)
and the characteristic of the identified region indicates a function of the protein of
interest (e.g., a transcription factor; an oncoprotein).

The present invention also relates to a method of determining whether a
protein of interest which binds to genomic DNA of a cell functions as a transcription
factor. In one embodiment, DNA binding protein of the cell is crosslinked to
genomic DNA of the cell and DNA fragments of the crosslinked genome are
generated. The DNA fragment to which the protein of interest is bound is
removed. The resulting DNA fragment is separated from the protein of interest and
amplified. The DNA fragment is combined with DNA comprising a sequence
complementary to genomic DNA of the cell, under conditions in which
hybridization between the DNA fragment and the sequence complementary to genomic
DNA occurs. The region of the sequence complementary to genomic DNA to which
the DNA fragment hybridizes is identified, wherein if the region of the genome is a
regulatory region, then the protein of interest is a transcription factor.

The methods of the present invention can be used to examine and/or identify
DNA binding of proteins across the entire genome of a eukaryotic organism. For
example, DNA binding proteins across the entire genome of eukaryotic organisms
such as yeast, Drosophila and humans can be analyzed. Alternatively, they can be
used to examine and/or identify DNA binding of proteins to an entire chromosome
or set of chromosomes of interest.

A variety of proteins which bind to DNA can be analyzed. For example, any
protein involved in DNA replication such as a transcription factor, or an oncoprotein
can be examined in the methods of the present invention.

There are a variety of methods which can be used to link DNA binding
protein of the cell to the genome of the cell. For example, UV light can be used. In
a particular embodiment, formaldehyde is used to crosslink DNA binding proteins to
the genomic DNA of a cell.

In the methods of the present invention, DNA fragments bound to the protein
of interest can be removed from the mixture comprising DNA
fragment(s) bound to the protein of interest and DNA fragments which are not
bound to the protein of interest, using a variety of methods. For example,
immunoprecipitation using an antibody (e.g., polyclonal, monoclonal) or antigen
binding fragment thereof which binds (specifically) to the protein of interest can be
used. In addition, the protein of interest can be labeled or tagged using, for example,
an antibody epitope (e.g., hemagglutinin (HA)).

The DNA fragments in the methods described herein can be amplified using,
for example, ligation-mediated polymerase chain reaction (e.g., see Current
Protocols in Molecular Biology, Ausubel, F.M. et al., eds. 1991, the teachings of
which are incorporated herein by reference).

The DNA comprising the complement sequence of the genome of the cell
can be combined with the isolated DNA fragment to which the protein of interest
binds using a variety of methods. For example, the complement sequence can be
immobilized on a glass slide (e.g., Corning Microarray Technology (CMT™)
GAPS™) or on a microchip. Conditions of hybridization used in the methods of the
present invention include, for example, high stringency conditions and/or moderate
stringency conditions. (See, e.g., pages 2.10.1-2.10.16 (see particularly 2.10.8-11)
and pages 6.3.1-6 in Current Protocols in Molecular Biology.) Factors such as
probe length, base composition, percent mismatch between the hybridizing
sequences, temperature and ionic strength influence the stability of hybridization.
Thus, high or moderate stringency conditions can be determined empirically, and
depend in part upon the characteristics of the known nucleic acids (DNA, RNA) and
the other nucleic acids to be assessed for hybridization thereto.

The methods of the present invention can further comprise comparing the
results to a control. For example, in one embodiment, the methods of the present
invention can be carried out using a control protein which is not a DNA binding
protein. In one embodiment, immunoprecipitation is performed using an antibody
against an HA or MYC epitope tag. The results of immunoprecipitating the protein
of interest containing the tag, and the protein of interest without the tag, are
compared. The untagged protein should not be immunoprecipitated, and thus
serves as a negative control.

As described in the exemplification, a particular embodiment of the present
invention comprises the combined use of Chromatin Immunoprecipitation (ChIP)
and Genome-wide expression monitoring microarrays. Chromatin
immunoprecipitation allows the detection of proteins that are bound to a particular
region of DNA. It involves four steps: (1) formaldehyde cross-linking proteins to
DNA in living cells, (2) disrupting and then sonicating the cells to yield small
fragments of cross-linked DNA, (3) immunoprecipitating the protein-DNA
crosslinks using an antibody which specifically binds the protein of interest, and (4)
reversing the crosslinks and amplifying the DNA region of interest using the
Polymerase Chain Reaction (PCR). Analysis of the PCR product yield compared to
a non-immunoprecipitated control determines whether the protein of interest binds
to the DNA region tested. However, each region of DNA must be tested
individually by PCR. Thus, the ChIP technique is limited to the small set of DNA
regions that are chosen to be tested.

In contrast, the present method is not limited to amplifying individual DNA
regions by performing PCR with specific primers. Rather, the entire genome is
amplified using a Ligation-mediated PCR (LM-PCR) strategy. The amplified DNA
was fluorescently labeled by including fluorescently-tagged nucleotides in the
LM-PCR reaction. Finally, the labeled DNA was hybridized to a DNA microarray
containing spots representing all or a subset (e.g., a chromosome or chromosomes)
of the genome. The fluorescent intensity of each spot on the microarray relative to a
non-immunoprecipitated control demonstrated whether the protein of interest bound
to the DNA region located at that particular spot. Hence, the methods described
herein allow the detection of protein-DNA interactions across the entire genome.
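
The per-spot comparison against a non-immunoprecipitated control can be sketched in a few lines of Python. This is a minimal illustration only, not the analysis used in the exemplification: the function name, the spot identifiers, the intensity values and the two-fold cutoff are all assumptions chosen for the example, since the text does not specify a cutoff at this point.

```python
from typing import Dict, List, Tuple


def enriched_spots(
    ip: Dict[str, float],
    control: Dict[str, float],
    min_ratio: float = 2.0,  # assumed cutoff; not specified in the text
) -> List[Tuple[str, float]]:
    """Return (spot, IP/control ratio) pairs at or above min_ratio, highest first."""
    hits = []
    for spot, ip_signal in ip.items():
        ctrl_signal = control.get(spot)
        # Skip spots with no usable control signal to avoid division by zero.
        if ctrl_signal is None or ctrl_signal <= 0:
            continue
        ratio = ip_signal / ctrl_signal
        if ratio >= min_ratio:
            hits.append((spot, ratio))
    return sorted(hits, key=lambda item: item[1], reverse=True)


# Hypothetical background-subtracted intensities for three array spots.
ip_channel = {"spot_A": 5200.0, "spot_B": 810.0, "spot_C": 1500.0}
control_channel = {"spot_A": 650.0, "spot_B": 790.0, "spot_C": 500.0}

for spot, ratio in enriched_spots(ip_channel, control_channel):
    print(f"{spot}: {ratio:.1f}-fold over control")
```

A fixed ratio cutoff is the simplest possible rule; it ignores the larger variability of low intensity spots, which is why an error model is applied in the analysis described later in the text.
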

In particular, DNA microarrays consisting of most of yeast chromosome III
plus approximately 15 model genes whose expression has been well studied were
constructed. These arrays were used in conjunction with the ChIP technique to
study the DNA-binding properties of transcription factors and the transcription
apparatus genome-wide. The methods described herein provide insights into the
mechanism and regulation of gene expression in eukaryotic cells.

The genome-wide location analysis method described herein allows
protein-DNA interactions to be monitored across the entire yeast genome and is diagrammed
in Figure 1. The method combines a modified Chromatin Immunoprecipitation
(ChIP) procedure, which has been previously used to study in vivo protein-DNA
interactions at one or a small number of specific DNA sites, with DNA microarray
analysis. Briefly, cells are fixed with formaldehyde, harvested by sonication, and
DNA fragments that are crosslinked to a protein of interest are enriched by
immunoprecipitation with a specific antibody. After reversal of the crosslinking, the
enriched DNA is amplified and labeled with a fluorescent dye using
ligation-mediated PCR (LM-PCR). A sample of DNA that has not been enriched by
immunoprecipitation is subjected to LM-PCR in the presence of a different
fluorophore, and both IP-enriched and unenriched pools of labeled DNA are
hybridized to a single DNA microarray containing all yeast intergenic sequences.
The IP-enriched/unenriched ratio of fluorescence intensity obtained from three
independent experiments can be used with a weighted average analysis method to
calculate the relative binding of the protein of interest to each sequence represented
on the array (see Figure 2).
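
One common way to combine replicate ratios with weights is an inverse-variance weighted mean of the log ratios. The sketch below illustrates that general idea; it is an assumption made for illustration and is not taken from Figure 2, whose details are not reproduced in the text. The function name, the ratios and the per-measurement uncertainties are all hypothetical.

```python
import math
from typing import Sequence


def weighted_mean_log_ratio(
    ratios: Sequence[float], sigmas: Sequence[float]
) -> float:
    """Inverse-variance weighted mean of log10 ratios from replicate arrays."""
    if len(ratios) != len(sigmas) or not ratios:
        raise ValueError("need one uncertainty per ratio")
    # A measurement with a smaller uncertainty receives a larger weight.
    weights = [1.0 / (s * s) for s in sigmas]
    logs = [math.log10(r) for r in ratios]
    return sum(w * x for w, x in zip(weights, logs)) / sum(weights)


# Hypothetical IP-enriched/unenriched ratios for one spot in three experiments,
# with assumed uncertainties (in log10 units) for each measurement.
ratios = [4.0, 5.0, 3.2]
sigmas = [0.10, 0.10, 0.30]
combined = 10 ** weighted_mean_log_ratio(ratios, sigmas)
print(f"combined ratio: {combined:.2f}")
```

Averaging in log space treats a two-fold enrichment and a two-fold depletion symmetrically; with equal uncertainties the result reduces to the geometric mean of the ratios.
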

Four features of the global location profiling method were found to be
critical for consistent, high-quality results. First, DNA microarrays with consistent
spot quality and even signal background play an obvious role. An example of an
image generated by the technique described herein is shown in Figure 5A. Second,
the LM-PCR method described herein was developed to permit reproducible
amplification of very small amounts of DNA; signals for greater than 99.9% of
genes were essentially identical within the error range when independent samples of
1 ng of genomic DNA were amplified with the LM-PCR method (Figure 5B).
Third, each experiment was carried out in triplicate, allowing an assessment of the
reproducibility of the binding data. And fourth, a single-array error model described
by Hughes et al. (2000) was adopted to handle noise associated with low intensity
spots and to average repeated experiments with appropriate weights.

The quantitative amplification of small amounts of DNA generates some
uncertainty for the low intensity spots. In order to track that uncertainty and to be
able to average repeated experiments with appropriate relative weights, we adopted
a single-array error model that was first described by Hughes et al. (2000).
According to this error model, the significance of a measured ratio at a spot is
defined by a statistic X, which takes the form

$$X = \frac{a_2 - a_1}{\left[\sigma_1^2 + \sigma_2^2 + f^2\,(a_1^2 + a_2^2)\right]^{1/2}} \qquad (1)$$

where $a_{1,2}$ are the intensities measured in the two channels for each spot, $\sigma_{1,2}$ are the
uncertainties due to background subtraction, and $f$ is a fractional multiplicative error
such as would come from hybridization non-uniformities, fluctuations in the dye
incorporation efficiency, scanner gain fluctuations, etc. X is approximately normal.
The parameters $\sigma$ and $f$ were chosen such that X has unit variance. The significance
of a change of magnitude $|X|$ is then calculated as

$$p = 2 \times \left(1 - \mathrm{Erf}(|X|)\right). \qquad (2)$$
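
As a worked illustration of the error model, the following Python sketch evaluates the statistic X and a two-sided significance for one spot. It reads equation (1) as X = (a2 - a1) / sqrt(sigma1^2 + sigma2^2 + f^2 (a1^2 + a2^2)). It also assumes that "Erf" in equation (2) denotes the cumulative standard normal distribution, which is the reading consistent with X being approximately normal with unit variance; under that assumption p equals erfc(|X| / sqrt(2)). The function names and all numeric inputs are hypothetical.

```python
import math


def x_statistic(a1: float, a2: float, sigma1: float, sigma2: float, f: float) -> float:
    """Significance statistic for one spot, following the reading of equation (1)."""
    denom = math.sqrt(sigma1**2 + sigma2**2 + f**2 * (a1**2 + a2**2))
    return (a2 - a1) / denom


def p_value(x: float) -> float:
    """Two-sided p-value for |X|, assuming X is standard normal.

    Assumption: "Erf" in equation (2) is read as the cumulative standard
    normal distribution Phi, so p = 2 * (1 - Phi(|X|)) = erfc(|X| / sqrt(2)).
    """
    return math.erfc(abs(x) / math.sqrt(2.0))


# Hypothetical spot: channel intensities, background uncertainties and
# fractional error are example values only.
x = x_statistic(a1=400.0, a2=1200.0, sigma1=50.0, sigma2=50.0, f=0.2)
print(f"X = {x:.2f}, p = {p_value(x):.4f}")
```

Because the f term scales with intensity while the sigma terms do not, the background uncertainties dominate the denominator for dim spots, so a given ratio is less significant at low intensity than at high intensity.
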

While this invention has been particularly shown and described with
references to preferred embodiments thereof, it will be understood by those skilled
in the art that various changes in form and details may be made therein without
departing from the scope of the invention encompassed by the appended claims.

EXEMPLIFICATION 

Example 1 DESIGN OF YEAST CHROMOSOME III AND SELECTED
MODEL GENES ARRAY FOR THE CHARACTERIZATION OF
PROTEIN-DNA INTERACTIONS

The array contains all non-overlapping open reading frames (ORFs) on
Chromosome III (see the Table). When a sequence contained part or all of two
potential reading frames, the larger sequence was chosen to represent the ORF.
Any remaining sequence was included in intergenic fragments.

All intergenic regions larger than 100 bp are represented by fragments
averaging 500 bp. Where regions are greater than 700 bp, they are broken into
multiple fragments of 300 to 600 bp. PCR primers for each region were chosen
using the Saccharomyces Genomic Database (SGD) "Design Primers" program from
Stanford University. The total number of intergenic fragments equals 241 for
Chromosome III.
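
The size rules for intergenic fragments can be expressed as a short Python sketch. The text gives only the size limits; the rule used here for choosing the number of pieces (the fewest equal-sized pieces of at most 600 bp) is an assumption made for illustration, as are the function name and coordinate convention. Primer design itself is not modeled.

```python
import math
from typing import List, Tuple


def tile_intergenic(length_bp: int) -> List[Tuple[int, int]]:
    """Split an intergenic region into (start, end) fragments (0-based, end-exclusive)."""
    if length_bp <= 100:
        return []                  # regions of 100 bp or less are not represented
    if length_bp <= 700:
        return [(0, length_bp)]    # a single fragment covers the region
    # Assumed rule: fewest equal-sized pieces of at most 600 bp. For regions
    # over 700 bp each piece then falls between 300 and 600 bp.
    n = math.ceil(length_bp / 600)
    bounds = [round(i * length_bp / n) for i in range(n + 1)]
    return list(zip(bounds[:-1], bounds[1:]))


for region_length in (80, 500, 1200, 1700):
    print(region_length, tile_intergenic(region_length))
```
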

The location and size of open reading frames were determined from the
Saccharomyces Genomic Database (SGD) functional chromosomal map.

An additional 17 model genes (see the Table) were selected based on their
high frequency of citation in transcription literature. Each gene was amplified as
well as 1-2 kb upstream and 500 bp downstream of the coding region.

Chip - Microarray Protocols
PCR generation of unmodified yeast ORF DNA
100 µl reaction generally yields approximately 5-6 µg DNA

RXN mix:

10.0 µl      10X PCR buffer (Perkin Elmer, AmpliTaq)
8.0 µl       25 mM MgCl2 (Perkin Elmer, AmpliTaq)
10.0 µl      10X dNTPs (2 mM each, Pharmacia 100 mM stocks)
1.0-2.0 µl   ORF DNA (Research Genetics, approximately 10 ng)
2.5 µl       each universal primer (Research Genetics, 20 µM solution)
1.6 µl       diluted Pfu DNA polymerase (diluted 1:100 in water, Stratagene, 0.02 U)
1.0 µl       AmpliTaq DNA polymerase (5 U, Perkin Elmer)
63.4 µl      ddH2O

PCR Generation of Yeast Intergenic regions

100 µl reaction generally yields approximately 5-6 µg DNA

RXN mix:

10.0 µl      10X PCR buffer (Perkin Elmer, AmpliTaq)
8.0 µl       25 mM MgCl2 (Perkin Elmer, AmpliTaq)
10.0 µl      10X dNTPs (2 mM each, Pharmacia 100 mM stocks)
1.0 µl       Yeast Genomic DNA (Research Genetics, approximately 100 ng)
5.0 µl       each primer (Research Genetics, 20 µM solution)
1.6 µl       diluted Pfu DNA polymerase (diluted 1:100 in water, Stratagene, 0.02 U)
1.0 µl       AmpliTaq DNA polymerase (5 U, Perkin Elmer)
58.4 µl      ddH2O
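
When many reactions are set up at once, the shared components of a recipe like the intergenic-region mix are usually pooled. The Python sketch below checks that the listed volumes add up to the stated 100 µl and scales the shared components to a plate of reactions. The 10% pipetting overage, the choice to leave the two primers out of the pooled mix (each region has its own primer pair), and the names used are all assumptions for illustration, not part of the protocol.

```python
# Per-reaction volumes in microliters, taken from the intergenic-region recipe.
INTERGENIC_MIX_UL = {
    "10X PCR buffer": 10.0,
    "25 mM MgCl2": 8.0,
    "10X dNTPs": 10.0,
    "yeast genomic DNA": 1.0,
    "primer 1": 5.0,
    "primer 2": 5.0,
    "diluted Pfu DNA polymerase": 1.6,
    "AmpliTaq DNA polymerase": 1.0,
    "ddH2O": 58.4,
}

# Components assumed (for this sketch) to be added per well rather than pooled.
PER_WELL = ("primer 1", "primer 2")


def master_mix(per_reaction, n_reactions, overage=0.10, per_well=PER_WELL):
    """Scale the shared components to n_reactions plus a fractional overage."""
    factor = n_reactions * (1.0 + overage)
    return {
        name: round(volume * factor, 1)
        for name, volume in per_reaction.items()
        if name not in per_well
    }


total_ul = sum(INTERGENIC_MIX_UL.values())
print(f"per-reaction total: {total_ul:.1f} uL")
for name, volume in master_mix(INTERGENIC_MIX_UL, n_reactions=96).items():
    print(f"{name}: {volume} uL")
```
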

Cycling for ORF and intergenic DNA
95°C 3 min
30 cycles of:
    94°C 30 sec
    60°C 30 sec
    72°C 2 min

PCR Cleanup:

Reactions were cleaned using Qiagen QIAquick 96 PCR purification kits
according to the manufacturer's protocol with the following exception. DNA was
eluted with 120 µl of T.E. 8.0 (10 mM Tris, 1 mM EDTA, pH 8.0). T.E. 8.0 was
applied to the Qiagen membrane and allowed to sit 5 minutes before elution. The
DNA was collected into a Corning polypropylene 96 well plate.

Reactions were quantified by visualizing 1 µl of the purified DNA on an
agarose gel compared to a known quantity of lambda DNA cut with HindIII
(Promega).

DNA was stored at -20°C until shortly before printing. The DNA was then
dried down by speed vac in the Corning microtiter plates to less than 5 µl.

PRINTING 

PCR reactions were resuspended to approximately 0.5 mg/ml in 3X SSC.
SSC was made as a 20X stock (3 M NaCl, 0.3 M Na3citrate·2H2O, pH'd to 7.0 with
HCl) and diluted to the desired concentration with H2O.

10-15 µl of the DNA was placed in a Corning 96 or 384 well plate and
GAPS coated slides were printed using the Cartesian robot. PCR products should
be greater than 250 bp.
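
Diluting the 20X SSC stock to a working concentration such as 3X follows the usual C1·V1 = C2·V2 relation. The helper below is a minimal sketch of that arithmetic; the function name and the 10 ml final volume are arbitrary example choices, not values from the protocol.

```python
from typing import Tuple


def dilute_stock(stock_x: float, target_x: float, final_volume_ml: float) -> Tuple[float, float]:
    """Return (stock_ml, water_ml) for diluting a concentrated stock (C1*V1 = C2*V2)."""
    if not 0 < target_x <= stock_x:
        raise ValueError("target concentration must be positive and not exceed the stock")
    stock_ml = final_volume_ml * target_x / stock_x
    return stock_ml, final_volume_ml - stock_ml


# 3X SSC from the 20X stock; the 10 ml final volume is an example value.
stock_ml, water_ml = dilute_stock(20.0, 3.0, 10.0)
print(f"{stock_ml:.2f} ml of 20X SSC + {water_ml:.2f} ml H2O")
```

The same helper covers the 3.5X and 0.1X SSC solutions used in the later pre-hybridization and wash steps.
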



Slide Processing 

1. Rehydrated arrays by holding slides over a dish of hot ddH2O (~10 sec).

2. Snap-dried each array (DNA side up) on a 100°C hot plate for ~3 seconds.

3. UV X-linked DNA to the glass by using a Stratalinker set for 60 mJoules.

4. Dissolved 5 g of succinic anhydride (Aldrich) in 315 mL of
n-methyl-pyrrolidinone.

5. To this, added 35 mL of 0.2 M NaBorate pH 8.0, and stirred until dissolved
(Boric Acid pH'd with NaOH).

6. Soaked arrays in this solution for 15 minutes with shaking.

7. Transferred arrays to 95°C water bath for 2 minutes.

8. Quickly transferred arrays to 95% EtOH for 1 minute.

9. Air dried slides array side up at a slight angle (close to vertical).



Slide pre-hybridization

1. Incubated slide in 3.5X SSC, 0.1% SDS, 10 mg/ml BSA (Sigma) in a Coplin
jar for 20 minutes at 50°C (place Coplin jar in water bath).

2. Washed slide by dipping in water and then isopropanol.

3. Air dried array side up at slight angle (close to vertical).

Probe preparation

1. The probe volume should be 20-30 µl for a small coverslip (25 mm²) and
40-60 µl for a large coverslip (24 x 60 mm).

2. Brought probe (cDNA or PCR based) up to final hyb volume in 3X SSC,
0.1% SDS with 10 µg E. coli tRNA (Boehringer-Mannheim).

3. Boiled in heat block for 3-5 minutes.

4. Snap-cooled on ice and spun.

Hybridization 

1. Pipetted probe onto slide. Dropped coverslip onto liquid, avoiding bubbles.

2. Assembled over 50°C waterbath in hybridization chamber. Clamped shut.

3. Submerged in 50°C waterbath overnight.

Scanning 

1. Disassembled hybridization chamber right side up.

2. Removed coverslip with fingers or tweezers.

3. Placed in 0.1X SSC, 0.1% SDS at room temperature for 5-10 minutes.

4. Transferred slides to 0.1X SSC for 2.5 minutes and again for 2.5 minutes.

5. Blew dry and scanned slide.

Data Analysis 

The data generated from scanning was analyzed using the ImaGene software.

The Table

Yeast ORF               Model Genes

YCL001w     RER1        YOL086C     ADH1
YCL001w-a               YBR115c     LYS2
YCL002C                 YBR039C     PHO5
YCL004W     PGS1        YIR019C     FLO11
YCL005w                 YDL215C     GDH2
YCL006C                 YER103W     SSA4
YCL007C     CWH36       YHR053C     CUP1
YCL008c     STP22       YKL178C     STE3
YCL009C     ILV6        YIL163C     SUC2
YCL010c                 YOR202W     HIS3
YCL011c     GBP2        YJR048W     CYC1
YCL012w                 YIR153C     INO1
YCL014w     BUD3        YBR020W     GAL1
YCL016C                 YBR019C     GAL10
YCL017C     NSF1        YDL227C     HO
YCL018w     LEU2        YPL256C     CLN2
YCL019W                 YGR108w     CLB1
YCL020W
YCL024W
YCL025C     AGP1
YCL026c-a   FRM2
YCL027W     FUS1
YCL028W
YCL029W     BIK1
YCL030C     HIS4
YCL031C     RPB7
YCL032W     STE50
YCL033C
YCL034W
YCL035C
YCL036W
YCL037C     SRO9
YCL038C
YCL039W
YCL040W     GLK1
YCL041C
YCL042W
YCL043C     PDI1
YCL044C
YCL045C
YCL046W
YCL047C
YCL048W
YCL049C
YCL050c     APA1
YCL051W     LRE1
YCL052C     PBN1






YCL054W 








YCL055W 


KAR4 






YCL056W 








YCL057W 


PRDl 






YCL058C 








YCL059C 


KRRl 






YCL061C 








YCL063W 








YCL064C 


CHAl 






YCL065W 








YCL066W 


HMLALPHAl 






YCL067C 


HMLALPHA2 






YCL068C 








YCL069W 








YCL073C 








YCL074W 








YCL075W 








YCL076W 
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Yeast ORF 




Model Genes 




YCROOIW 








YYCR002C 


CDCIO 






YCR003W 


MRPL32 






YCR004C 


YCP4 






YCROOSc 


CIT2 






YCR006C 








YCR007C 








YCROOSw 


SAT4 






YCR009C 


RVS161 






YCROlOc 








YCROllc 


ADPl 






YCR012W 


PGKl 






YCR014C 


P0L4 






YCROlSc 








YCR016W 








YCR017C 








YCR018C 


SRDl 






YCROlSca 








YCR019W 








YCR020C 


PET 18 






YCR020CA 


MAK31 






YCR020wb 


HTLl 






YCR021C 


HSP30 







t 
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Yeast ORF 




Model Genes 




YCR022C 








YCR023C 








YCR024C 








YCR024CA 


PMPl 






YCR025C 








YCR026C 






■ 


YCR027C 








YCR028C 


FEN2 






YCR028CA 


RIMl 






YCR030C 








YCR031C 


RPS14A 






YCR032W 


BPHl 






YCR033W 








YCR034W 


FENl 






YCR035C 


RRP43 






YCR036W 


RBKl 






YOlOBTc 


PH087 






YCR038C 


BUDS 






YCR039C 


MATALPHA2 






YCR040W 


MATALPHAl 






YCR041W 








YCR042C 


TSMl 






YCR043C 
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Yeast ORF 




Model Genes 




YCR044C 








YCR045C 








YCR046C 


IMGl 






YCR047C 








YCR048W 


AREl 






YCR051W 








YCR052W 


RSC6 






YCR053W 


THR4 






YCR054C 


CTR86 






YCR057C 


PWP2 






YCR059C 








YCR060W 








YCR061W 








YCR063W 








YCR064C 








YCR065W 


HCMl 






YCR066W 


RAD18 






YCR067C 


SED4 






YCR068W 








YCR069W 


sees 






YCROTlc 


IMG2 






YCR072C 








YCR073C 


SSK22 







Yeast ORF 




Model Genes 




YCR073wa 


S0L2 






YCR075C 


ERSl 
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YCR076C 








YCR077C 


PATl 






YCR079W 








YCR081W 


SRB8 






YCR082W 








YCR083W 








YCR084C 


TUPl 






YCR085W 








YCR086W 








YCR087W 








YCR088W 


ABPl 






YCR089W 


FIG2 






YCR090C 








YCR091W 


KIN82 






YCR092C 


MSH3 






YCR093W 


CDC39 






YCR094W 


CDC50 






YCR095C 








YCR096C 


A2 






YCR097W 


Al 






YCR098C 


GITl 
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Yeast ORF 




Model Genes 




YCR099C 








YCRlOOc 








YCRlOlc 








YCR102C 








YCR102wa 








YCR103 








YCR104W 


PAU3 






YCR105W 








YCR106W 








YCR107W 


AAD3 
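The ORF identifiers in the Table follow the standard systematic naming convention for yeast open reading frames: "Y", a chromosome letter (A to P for chromosomes I to XVI), an arm letter (L or R), a three-digit number counted outward from the centromere, and a strand letter (W for Watson, C for Crick), optionally followed by a hyphenated suffix. The sketch below decodes such a name; the function name `parse_orf` and the returned field names are illustrative assumptions.

```python
import re

# Systematic ORF name: Y + chromosome (A-P) + arm (L/R) + 3 digits + strand (W/C),
# with an optional "-A", "-B", ... suffix for ORFs added later.
ORF_PATTERN = re.compile(r"^Y([A-P])([LR])(\d{3})([WC])(?:-([A-Z]))?$")

def parse_orf(name):
    """Decode a systematic yeast ORF name, or return None if it does not fit."""
    match = ORF_PATTERN.match(name.upper())
    if match is None:
        return None
    chrom_letter, arm, number, strand, suffix = match.groups()
    return {
        "chromosome": ord(chrom_letter) - ord("A") + 1,  # A -> 1 (chromosome I)
        "arm": "left" if arm == "L" else "right",
        "number": int(number),
        "strand": "Watson" if strand == "W" else "Crick",
        "suffix": suffix,
    }

first_entry = parse_orf("YCL001W")
incomplete = parse_orf("YCR103")  # no strand letter, so this does not parse
```

Under this convention every "YCL" and "YCR" entry lies on chromosome III, while the model genes (for example YOL086C) come from other chromosomes. An entry listed without a strand letter is returned as `None` instead of being guessed.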
Example 2 GENOME-WIDE LOCATION AND FUNCTION OF DNA-BINDING PROTEINS

Global analysis of Gal4 binding sites 

To investigate the accuracy of the genome-wide location analysis method,
the analysis was used to identify sites bound by the transcriptional activator Gal4 in
the yeast genome. Gal4 was selected because it is among the best characterized
transcriptional activators, it is known to be responsible for induction of 10 genes
necessary for galactose metabolism, and a consensus DNA binding sequence (the
UASG) has been identified for Gal4 in the promoters of the GAL genes. Very little
Gal4 is bound at the UASG of the GAL1 and GAL10 promoters when cells are
grown in glucose (the repressed state), whereas relatively high levels of Gal4 are
bound in galactose (the activated state).

The genome-wide location of epitope-tagged Gal4p in both glucose and 
galactose media was investigated in three independent experiments, as described in 
more detail below. The location analysis experiment identified seven genes 
previously reported to be regulated by Gal4 and three additional genes encoding 
activities that are physiologically relevant to cells that utilize galactose as the sole 
carbon source, but which were not previously known to be regulated by this 
activator (Figure 6A).

The set of 24 genes whose promoter regions are most likely to be bound by
Gal4 by the analysis criteria (p-value < 0.00001) described herein is listed in Figure
6A. Gal4 does not functionally activate all of these genes, however, since only a
subset of the genes that share intergenic regions bound by Gal4 will be regulated by
this activator (Figure 6B). To identify genes that are both bound by Gal4 and
activated by galactose, genome-wide expression analysis was carried out. The upper
panel of Figure 6A shows genes whose expression is induced in galactose, whereas
the lower panel shows genes whose expression is galactose independent. Seven
genes previously reported to be regulated by Gal4 (GAL1, GAL2, GAL3, GAL7,
GAL10, GAL80 and GCY1) bound Gal4 and were activated in galactose. Three
genes whose expression was not previously associated with the Gal4 activator,
MTH1, PCL10 and FUR4, were also found to be bound by Gal4 and activated in
galactose. Substantially less Gal4 was associated with each of these promoters in
cells grown in glucose, as expected. Gal4p was not bound to the promoters of GAL4
and PGM2, genes previously thought to be regulated by Gal4, although direct
evidence for Gal4 binding to these promoters had not been demonstrated. Each of
these results was confirmed by conventional ChIP analysis (Figure 6C),
demonstrating that the microarray results accurately reflect results obtained by the
conventional approach, which has until now been used to study binding sites
individually.
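The two-panel classification described above (bound and induced versus bound but not induced) amounts to a threshold on the binding p-value followed by a set comparison with the expression data. A minimal sketch, assuming binding p-values are held in a dictionary and induced genes in a set; the function name, the placeholder genes GENE_X, GENE_Y and GENE_Z, and all numeric values are invented for illustration and are not results from the experiments:

```python
def classify_bound_genes(binding_p, induced, p_cutoff=1e-5):
    """Split genes passing the binding cutoff into induced and not-induced lists."""
    bound = {gene for gene, p in binding_p.items() if p < p_cutoff}
    upper_panel = sorted(bound & induced)   # bound and induced
    lower_panel = sorted(bound - induced)   # bound but expression unchanged
    return upper_panel, lower_panel

# Invented example values; only the cutoff (p < 0.00001) comes from the text.
binding_p = {"GAL1": 1e-9, "GAL10": 1e-9, "GENE_X": 5e-6, "GENE_Y": 0.2}
induced = {"GAL1", "GAL10", "GENE_Z"}

upper, lower = classify_bound_genes(binding_p, induced)
```

Here GENE_Y fails the binding cutoff and GENE_Z is induced without being bound, so neither appears in either panel.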

The ten genes that are both bound and regulated by Gal4 were selected and
the AlignAce program was used to identify a consensus binding site for this
activator (Figure 6D). This binding site sequence is similar to, but refines, the
sequence previously determined for Gal4. The Gal4 binding sequence occurs at
approximately 50 sites throughout the yeast genome where Gal4 binding is not
detected, indicating that the simple presence of this sequence is not sufficient for
Gal4 binding.
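Counting occurrences of a binding sequence in genomic DNA, as in the comparison above, can be sketched with a simple pattern scan. The refined consensus of Figure 6D is not reproduced here, so the sketch uses the commonly cited Gal4 recognition pattern CGG-N11-CCG as a stand-in; that pattern, the function name and the test sequence are assumptions for illustration only.

```python
import re

# Stand-in pattern: CGG, any 11 bases, CCG. The lookahead lets overlapping
# sites be reported. The reverse complement of this pattern has the same
# form, so scanning one strand also covers sites on the other strand.
GAL4_SITE = re.compile(r"(?=(CGG[ACGT]{11}CCG))")

def find_sites(sequence):
    """Return (position, matched site) pairs for every match in the sequence."""
    sequence = sequence.upper()
    return [(m.start(), m.group(1)) for m in GAL4_SITE.finditer(sequence)]

# Hypothetical promoter fragment containing one site.
promoter = "TTT" + "CGG" + "A" * 11 + "CCG" + "TTT"
sites = find_sites(promoter)
```

A scan of this kind reports only where the sequence occurs. As the text notes, occurrence alone does not predict binding, so the hits would still need to be compared against the location data.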

Three genes whose expression was not previously associated with the Gal4
activator, MTH1, PCL10 and FUR4, were found to be bound by Gal4 and activated
in galactose. It is likely that these three genes are genuine Gal4p targets because
they share the following three features with the well established Gal4-dependent
GAL genes. MTH1, PCL10 and FUR4 are galactose-induced (Figure 6A). Galactose
induction depends on Gal4 (Figure 6C). MTH1, PCL10 and FUR4 promoters are
bound by Gal4 when cells are grown in galactose but not in glucose (Figure 6A).
The binding of Gal4p to the MTH1, PCL10 and FUR4 promoters was verified by
conventional ChIP analysis (Figure 6C).

The identification of MTH1, PCL10 and FUR4 as Gal4-regulated
genes reveals how regulation of several different metabolic pathways is
interconnected (Figure 6F). MTH1 encodes a transcriptional repressor of many
genes involved in metabolic pathways that would be unnecessary when cells utilize
galactose as a sole carbon source. Among the most interesting of its targets are a
subset of the HXT genes involved in hexose transport. The results described herein
indicate that the cell responds to galactose by modifying the concentration of its
hexose transporters at the membrane in a Gal4-dependent fashion; Gal4 activates
the galactose transporter gene GAL2 and, by activation of the MTH1 repressor, causes
reduced levels of glucose transporter expression. The Pcl10 cyclin associates with
Pho85p and appears to repress the formation of glycogen. The observation that
PCL10 is Gal4-activated indicates that reduced glycogenesis occurs to maximize
the energy obtained from galactose metabolism. FUR4 encodes a uracil permease
and its induction by Gal4 may reflect a need to increase intracellular pools of uracil
to permit efficient UDP addition to galactose catalyzed by Gal7.

Previous studies have shown that Gal4 binds to at least some GAL gene
promoters when cells are grown on carbon sources other than galactose, as long as
glucose is absent. Genome-wide location analysis of Gal4 in cells grown on
raffinose was repeated and it was found that the results were essentially identical to
those obtained when cells were grown on galactose. These results indicate that
Gal4 exhibits the same binding behavior at all its genomic binding sites and
demonstrate that the genome-wide location method is highly reproducible.
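One simple way to express how closely two location experiments agree is the overlap between their sets of bound regions. The sketch below uses the Jaccard index for this purpose. It is an illustrative measure chosen here, not the comparison actually used in the experiments, and the region labels are hypothetical.

```python
def jaccard(regions_a, regions_b):
    """Return the Jaccard index (shared regions / all regions) of two sets."""
    a, b = set(regions_a), set(regions_b)
    if not a and not b:
        return 1.0  # two empty results are treated as identical
    return len(a & b) / len(a | b)

# Hypothetical bound-region labels from two growth conditions.
condition_1 = {"region_1", "region_2", "region_3"}
condition_2 = {"region_1", "region_2", "region_3", "region_4"}

identical = jaccard(condition_1, condition_1)
partial = jaccard(condition_1, condition_2)
```

A value of 1.0 corresponds to identical sets of bound regions. In this made-up example, three of the four regions seen across both conditions are shared.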

Global analysis of Ste12 binding sites

The genome-wide binding profile of the DNA-binding transcription activator
Ste12 was also investigated. Ste12 is of interest because it has a defined cellular role
- it is key to the response of haploid yeast to mating pheromones - but only a few
genes regulated by Ste12 have been identified. Activation of the
pheromone-response pathway causes cell cycle arrest and transcriptional activation of more than
100 genes. Expression analysis using ste12 mutant cells has shown that Ste12 is
required for the pheromone induction of all of these genes. However, the
mechanism by which Ste12 activates transcription of these genes in response to
pheromone has not been elucidated.

The genome-wide location of epitope-tagged Ste12p before and after
pheromone treatment was investigated in three independent experiments. The set of
genes whose promoter regions are most likely to be bound by Ste12 by the analysis
criteria (p-value < 0.005) described herein is listed in Figure 7; the upper panel
shows genes whose expression is induced by alpha factor, whereas the lower panel
shows genes whose expression is not significantly induced by alpha factor. Of the
36 genes that are induced by alpha factor and are bound by Ste12, 12 are known to
participate in various steps of the mating process (FIG2, AFR1, GIC2, STE12,
CHS1, KAR5, FUS1, AGA1, FUS3, CIK1, FAR1, FIG1).

Ste12 binds to some promoters in the absence of pheromone signaling;
however, its binding to most genes is enhanced by alpha factor. Interestingly, Ste12p
is bound to its own promoter both before and after pheromone treatment. Together,
the binding and expression data argue that the regulation of the STE12 gene involves
a positive feedback loop. STE12 expression is increased immediately after
pheromone treatment, indicating that the bound but inactive Ste12 activator is
rapidly converted to an active form. Increased expression of the STE12 gene would
allow more Ste12p to be made and this would, in turn, activate its target genes.

Twenty-four genes whose expression was not previously associated with
Ste12 and the mating process were found to be bound by Ste12 and activated by
alpha factor. Considering that their pheromone induction is eliminated in ste12
mutant cells, it is likely that these 24 genes are also genuine Ste12 targets. The
identities of these genes indicate interesting details about various steps of the mating
process. For example, one Ste12 target gene, PCL2, encodes a G1 cyclin that forms
complexes with the cyclin-dependent kinase (cdk) Pho85. The Pcl2-Pho85 and
Pcl1-Pho85 complexes act in concert with Cln1-Cdc28 and Cln2-Cdc28 cyclin
dependent kinase complexes to promote G1 cell cycle progression (Measday et al.,
1994). The Pcl2-Pho85 kinase complex has a substrate specificity that is
overlapping but different from that of the Cln1-Cdc28 and Cln2-Cdc28 complexes. During the
mating process, haploid yeast cells are arrested at start of the late G1 phase, due to
the inhibition of Cln1-Cdc28 and Cln2-Cdc28 activities by Far1, which is encoded by
another Ste12 target gene. Activation of PCL2 by Ste12 after pheromone treatment
indicates that increased Pho85 complex activities are likely necessary to compensate
for the loss of Cdc28 activities.

Most Ste12 target genes identified by analysis of genome locations of Ste12
and expression profiles during pheromone induction encode proteins involved in
various steps of the mating response. Among them are 11 previously
uncharacterized genes. The cellular roles for these genes, including YNL279W,
YOR129C, YOR343C, YPL192C, YER019W, YIL083C, YIL037C, YIL169C,
YNL105W, YOL155C and YNR064C, are therefore most likely related to mating.
Ste12 has also been implicated in other cellular processes. Together with
Tec1, Ste12 regulates the filamentation of diploid cells and invasive growth in
haploids. Two genes, TEC1 and FLO11, have been identified as Ste12 targets in
the filamentous growth pathway. Ste12 binding to these genes either in the presence or
absence of alpha factor was not detected. It is likely that Ste12p's binding to these
promoters is regulated by different physiological conditions.
CLAIMS

What is claimed is:

1. A method of identifying a region of a genome of a cell to which a protein of
interest binds, comprising the steps of:

a) crosslinking DNA binding protein in the cell to genomic DNA of the
cell, thereby producing DNA binding protein crosslinked to genomic
DNA;

b) generating DNA fragments of the genomic DNA crosslinked to DNA
binding protein in a), thereby producing a mixture comprising DNA
fragments to which DNA binding protein is bound;

c) removing a DNA fragment to which the protein of interest is bound
from the mixture produced in b);

d) separating the DNA fragment identified in c) from the protein of
interest;

e) amplifying the DNA fragment of d);

f) combining the DNA fragment of e) with DNA comprising a sequence
complementary to genomic DNA of the cell, under conditions in
which hybridization between the DNA fragment and a region of the
sequence complementary to genomic DNA occurs; and

g) identifying the region of the sequence complementary to genomic
DNA of f) to which the DNA fragment hybridizes,

whereby the region identified in g) is the region of the genome in the cell to
which the protein of interest binds.

2. The method of Claim 1 wherein the cell is a eukaryotic cell.

3. The method of Claim 1 wherein the protein of interest is selected from the
group consisting of: a transcription factor and an oncogene.

4. The method of Claim 1 wherein the DNA binding protein of the cell is
crosslinked to the genome of the cell using formaldehyde.

5. The method of Claim 1 wherein the DNA fragment of c) to which is bound
the protein of interest is identified using an antibody which binds to the
protein of interest.

6. The method of Claim 1 wherein the DNA fragment of e) is amplified using
ligation-mediated polymerase chain reaction.

7. The method of Claim 1 wherein the complement sequence of the genome of
f) is a DNA microarray.

8. The method of Claim 1 further comprising:

h) comparing the region identified in g) with a control.

9. A method of identifying a region of a genome of a cell to which a protein of
interest binds, comprising the steps of:

a) formaldehyde crosslinking DNA binding protein in the cell to
genomic DNA of the cell, thereby producing DNA binding protein
crosslinked to genomic DNA;

b) generating DNA fragments of the genomic DNA crosslinked to DNA
binding protein in a), thereby producing DNA fragments to which
DNA binding protein is bound;

c) immunoprecipitating the DNA fragment produced in b) to which the
protein of interest is bound using an antibody that specifically binds
the protein of interest;

d) separating the DNA fragment identified in c) from the protein of
interest;

e) amplifying the DNA fragment of d) using ligation-mediated
polymerase chain reaction;

f) fluorescently labeling the DNA fragment of e);

g) combining the labeled DNA fragment of e) with a DNA microarray
comprising a sequence complementary to genomic DNA of the cell,
under conditions in which hybridization between the DNA fragment
and a region of the sequence complementary to genomic DNA
occurs;

h) identifying the region of the sequence complementary to genomic
DNA to which the DNA fragment hybridizes by measuring the
fluorescence intensity; and

i) comparing the fluorescence intensity measured in h) to the
fluorescence intensity of a control,

whereby fluorescence intensity in a region of the genome which is greater
than the fluorescence intensity of the control in the region indicates the
region of the genome in the cell to which the protein of interest binds.

10. A method of determining a function of a protein of interest which binds to a
genome of a cell, comprising the steps of:

a) crosslinking DNA binding protein in the cell to genomic DNA of the
cell, thereby producing DNA binding protein crosslinked to genomic
DNA;

b) generating DNA fragments of the genomic DNA crosslinked to DNA
binding protein in a), thereby producing a mixture comprising DNA
fragments to which DNA binding protein is bound;

c) removing the DNA fragment to which the protein of interest is bound
from the mixture produced in b);

d) separating the DNA fragment identified in c) from the protein of
interest;

e) amplifying the DNA fragment of d);

f) combining the DNA fragment of e) with DNA comprising a sequence
complementary to genomic DNA of the cell, under conditions in
which hybridization between the DNA fragment and a region of the
sequence complementary to genomic DNA occurs;

g) identifying the region of the sequence complementary to genomic
DNA of f) to which the DNA fragment hybridizes; and

h) characterizing the region identified in g),

wherein the characteristics of the region of h) indicates a function of the
protein of interest which binds to the genome of the cell.

11. A method of determining whether a protein of interest which binds to the
genome of a cell functions as a transcription factor, comprising the steps of:

a) crosslinking DNA binding protein in the cell to the genomic DNA of
the cell, thereby producing DNA binding protein crosslinked to
genomic DNA;

b) generating DNA fragments of the genomic DNA crosslinked to DNA
binding protein in a), thereby producing a mixture comprising DNA
fragments to which DNA binding protein is bound;

c) removing the DNA fragment to which the protein of interest is bound
from the mixture produced in b);

d) separating the DNA fragment identified in c) from the protein of
interest;

e) amplifying the DNA fragment of d);

f) combining the DNA fragment of e) with DNA comprising a sequence
complementary to genomic DNA of the cell, under conditions in
which hybridization between the DNA fragment and a region of the
sequence complementary to genomic DNA occurs; and

g) identifying the region of the sequence complementary to genomic
DNA of f) to which the DNA fragment hybridizes,

wherein if the region of the sequence complementary to genomic DNA of g)
is a regulatory region, then the protein of interest is a transcription factor.